43 research outputs found

    Classification of microarrays; synergistic effects between normalization, gene selection and machine learning

    Background: Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate) is the result of a series of analysis steps, of which the most important are data normalization, gene selection and machine learning. Results: In this study, we used seven previously published cancer-related microarray data sets to compare the effects on classification performance of five normalization methods, three gene selection methods with 21 different numbers of selected genes and eight machine learning methods. Performance in terms of error rate was rigorously estimated by repeatedly employing a double cross-validation approach. Since performance varies greatly between data sets, we devised an analysis method that first compares methods within individual data sets and then visualizes the comparisons across data sets. We discovered both well-performing individual methods and synergies between different methods. Conclusion: Support Vector Machines with a radial basis kernel, linear kernel or polynomial kernel of degree 2 all performed consistently well across data sets. We show that there is a synergistic relationship between these methods and gene selection based on the t-test and the selection of a relatively high number of genes. Also, we find that these methods benefit significantly from using normalized data, although it is hard to draw general conclusions about the relative performance of different normalization procedures.

    Immortalization of T-cells is accompanied by gradual changes in CpG methylation resulting in a profile resembling a subset of T-cell leukemias

    We have previously described gene expression changes during spontaneous immortalization of T-cells, thereby identifying cellular processes important for cell growth crisis escape and unlimited proliferation. Here, we analyze the same model to investigate the role of genome-wide methylation in the immortalization process at different time points pre-crisis and post-crisis using high-resolution arrays. We show that over time in culture there is an overall accumulation of methylation alterations, with preferential increased methylation close to transcription start sites (TSSs), islands, and shore regions. Methylation and gene expression alterations did not correlate for the majority of genes, but for the fraction that correlated, gain of methylation close to TSSs was associated with decreased gene expression. Interestingly, the pattern of CpG site methylation observed in immortal T-cell cultures was similar to that of clinical T-cell acute lymphoblastic leukemia (T-ALL) samples classified as CpG island methylator phenotype positive. These sites were highly enriched for polycomb target genes and for genes involved in developmental, cell adhesion, and cell signaling processes. The presence of non-random methylation events in in vitro immortalized T-cell cultures and diagnostic T-ALL samples indicates altered methylation of CpG sites with a possible role in malignant hematopoiesis.

    Decreased telomere length in children with cartilage-hair hypoplasia

    Background: Cartilage-hair hypoplasia (CHH) is an autosomal recessive chondrodysplasia caused by RMRP (RNA component of mitochondrial RNA processing endoribonuclease) gene mutations. Manifestations include short stature, variable immunodeficiency, anaemia and increased risk of malignancies, all of which have also been described in telomere biology disorders. RMRP interacts with the telomerase RT (TERT) subunit, but the influence of RMRP mutations on telomere length is unknown. We measured relative telomere length (RTL) in patients with CHH, their first-degree relatives and healthy controls and correlated RTL with clinical and laboratory features. Methods: The study cohort included 48 patients with CHH with homozygous (n=36) or compound heterozygous RMRP mutations (median age 38.2 years, range 6.0-70.8 years), 86 relatives (74 with a heterozygous RMRP mutation) and 94 unrelated healthy controls. We extracted DNA from peripheral blood, sequenced the RMRP gene and measured RTL by qPCR. Results: Compared with age-matched and sex-matched healthy controls, median RTL was significantly shorter in patients with CHH (n=40 pairs, 1.05 vs 1.21, p=0.017), but not in mutation carriers (n=48 pairs, 1.16 vs 1.10, p=0.224). RTL correlated significantly with age in RMRP mutation carriers (r=-0.482, p<0.001) and non-carriers (r=-0.498, p …). Conclusions: Telomere length was decreased in children with CHH. We found no correlation between RTL and clinical or laboratory parameters. Peer reviewed.

    An integrated transcriptome analysis in T-cell acute lymphoblastic leukemia links DNA methylation subgroups to dysregulated TAL1 and ANTP homeobox gene expression

    Classification of pediatric T-cell acute lymphoblastic leukemia (T-ALL) patients into CIMP (CpG Island Methylator Phenotype) subgroups has the potential to improve current risk stratification. To investigate the biology behind these CIMP subgroups, diagnostic samples from Nordic pediatric T-ALL patients were characterized by genome-wide methylation arrays, followed by targeted exome sequencing, telomere length measurement, and RNA sequencing. The CIMP subgroups did not correlate significantly with variations in epigenetic regulators. However, the CIMP+ subgroup, associated with better prognosis, showed indicators of longer replicative history, including shorter telomere length (P = 0.015) and older epigenetic (P < 0.001) and mitotic age (P < 0.001). Moreover, the CIMP+ subgroup had significantly higher expression of ANTP homeobox oncogenes, namely TLX3, HOXA9, HOXA10, and NKX2-1, and novel genes in T-ALL biology including PLCB4, PLXND1, and MYO18B. The CIMP- subgroup, with worse prognosis, was associated with higher expression of TAL1 along with frequent STIL-TAL1 fusions (2/40 in CIMP+ vs 11/24 in CIMP-), as well as stronger expression of BEX1. Altogether, our findings suggest different routes for leukemogenic transformation in the T-ALL CIMP subgroups, indicated by different replicative histories and distinct methylomic and transcriptomic profiles. These novel findings can lead to new therapeutic strategies. Peer reviewed.

    DNA methylation holds prognostic information in relapsed precursor B-cell acute lymphoblastic leukemia

    Background: Few biological markers are associated with survival after relapse of B-cell precursor acute lymphoblastic leukemia (BCP-ALL). In pediatric T-cell ALL, we have identified promoter-associated methylation alterations that correlate with prognosis. Here, the prognostic relevance of CpG island methylator phenotype (CIMP) classification was investigated in pediatric BCP-ALL patients. Methods: Six hundred and one BCP-ALL samples from Nordic pediatric patients (age 1-18) were CIMP classified at initial diagnosis and analyzed in relation to clinical data. Results: Among the 137 patients who later relapsed, patients with a CIMP- profile (n = 42) at initial diagnosis had an inferior overall survival (pOS(5years) 33%) compared to CIMP+ patients (n = 95, pOS(5years) 65%) (p = 0.001), which remained significant in a Cox proportional hazards model including previously defined risk factors. Conclusion: CIMP classification is a strong candidate for improved risk stratification of relapsed BCP-ALL. Peer reviewed.

    Sixteen-year longitudinal evaluation of blood-based DNA methylation biomarkers for early prediction of Alzheimer's disease

    BACKGROUND: DNA methylation (DNAm), an epigenetic mark reflecting both inherited and environmental influences, has shown promise for Alzheimer's disease (AD) prediction. OBJECTIVE: Testing long-term predictive ability (> 15 years) of existing DNAm-based epigenetic age acceleration (EAA) measures and identifying novel early blood-based DNAm AD-prediction biomarkers. METHODS: EAA measures calculated from Illumina EPIC data from blood were tested with linear mixed-effects models (LMMs) in a longitudinal case-control sample (50 late-onset AD cases; 51 matched controls) with prospective data up to 16 years before clinical onset, and post-onset follow-up. Novel DNAm biomarkers were generated with epigenome-wide LMMs, and Sparse Partial Least Squares Discriminant Analysis applied at pre- (10-16 years) and post-AD-onset time-points. RESULTS: EAA did not differentiate cases from controls during the follow-up time (p > 0.05). Three new DNA biomarkers showed in-sample predictive ability on average 8 years pre-onset, after adjustment for age, sex, and white blood cell proportions (p-values: 0.022 to < 0.00001). Our longitudinally-derived panel replicated nominally (p = 0.012) in an external cohort (n = 146 cases, 324 controls). However, its effect size and discriminatory accuracy were limited compared to APOE ε4-carriership (OR = 1.38 per 1 SD DNAm score increase versus OR = 13.58 for ε4-allele carriage; AUCs = 77.2% versus 87.0%). Literature review showed low overlap (n = 4) across 3275 AD-associated CpGs from 8 published studies, and no overlap with our identified CpGs.

    DNA Methylation Adds Prognostic Value to Minimal Residual Disease Status in Pediatric T-Cell Acute Lymphoblastic Leukemia

    Background. Despite increased knowledge about genetic aberrations in pediatric T-cell acute lymphoblastic leukemia (T-ALL), no clinically feasible treatment-stratifying marker exists at diagnosis. Instead, patients are enrolled in intensive induction therapies with substantial side effects. In modern protocols, therapy response is monitored by minimal residual disease (MRD) analysis and used for postinduction risk group stratification. DNA methylation profiling is a candidate for subtype discrimination at diagnosis and we investigated its role as a prognostic marker in pediatric T-ALL. Procedure. Sixty-five diagnostic T-ALL samples from Nordic pediatric patients treated according to the Nordic Society of Pediatric Hematology and Oncology ALL 2008 (NOPHO ALL 2008) protocol were analyzed by HumMeth450K genome-wide DNA methylation arrays. Methylation status was analyzed in relation to clinical data and early T-cell precursor (ETP) phenotype. Results. Two distinct CpG island methylator phenotype (CIMP) groups were identified. Patients with a CIMP-negative profile had an inferior response to treatment compared to CIMP-positive patients (3-year cumulative incidence of relapse (CIR3y) rate: 29% vs. 6%, P = 0.01). Most importantly, CIMP classification at diagnosis allowed subgrouping of high-risk T-ALL patients (MRD >= 0.1% at day 29) into two groups with significant differences in outcome (CIR3y rates: CIMP negative 50% vs. CIMP positive 12%; P = 0.02). These groups did not differ regarding ETP phenotype, but the CIMP-negative group was younger (P = 0.02) and had higher white blood cell count at diagnosis (P = 0.004) compared with the CIMP-positive group. Conclusions. CIMP classification at diagnosis in combination with MRD during induction therapy is a strong candidate for further risk classification and could confer important information in treatment decision making. © 2016 Wiley Periodicals, Inc. Peer reviewed.

    Normalization and analysis of high-dimensional genomics data

    Microarray technology was introduced in the mid-1990s. It allowed genome-wide analysis of gene expression in a single experiment. Since its introduction, similar high-throughput methods have been developed in other fields of molecular biology. These high-throughput methods provide measurements for hundreds up to millions of variables in a single experiment, and a rigorous data analysis is necessary in order to answer the underlying biological questions. Further complications arise in data analysis because the complexity of the experimental procedures introduces technical variation into the data. This technical variation needs to be removed in order to draw relevant biological conclusions from the data. The process of removing the technical variation is referred to as normalization or pre-processing. During the last decade a large number of normalization and data analysis methods have been proposed. In this thesis, data from two types of high-throughput methods are used to evaluate the effect pre-processing methods have on further analyses. In areas where problems in current methods are identified, novel normalization methods are proposed. The evaluations of known and novel methods are performed on simulated data, real data and data from an in-house produced spike-in experiment.

    MC-Normalization: A Novel Method for Dye-Normalization of Two-Channel Microarray Data

    Pre-processing plays a vital role in two-color microarray data analysis. An analysis is characterized by its ability to identify differentially expressed genes (its sensitivity) and its ability to provide unbiased estimators of the true regulation (its bias). It has been shown that microarray experiments regularly underestimate the true regulation of differentially expressed genes. We introduce the MC-normalization, where C stands for channel-wise normalization, with considerably lower bias than the commonly used standard methods.